High-dimensional kNN joins with incremental updates
نویسندگان
چکیده
The k Nearest Neighbor (kNN) join operation associates each data object in one data set with its k nearest neighbors from the same or a different data set. The kNN join on high-dimensional data (high-dimensional kNN join) is an especially expensive operation. Existing high-dimensional kNN join algorithms were designed for static data sets and therefore cannot handle updates efficiently. In this article, we propose a novel kNN join method, named kNNJoin, which supports efficient incremental computation of kNN join results with updates on high-dimensional data. As a by-product, our method also provides answers for the reverse kNN queries with very little overhead. We have performed an extensive experimental study. The results show the effectiveness of kNNJoin for processing high-dimensional kNN join queries in dynamic workloads.
منابع مشابه
افزایش سرعت نگهداری افزایشی دید با استفاده از الگوریتم فاخته
Data warehouse is a repository of integrated data that is collected from various sources. Data warehouse has a capability of maintaining data from various sources in its view form. So, the view should be maintained and updated during changes of sources. Since the increase in updates may cause costly overhead, it is necessary to update views with high accuracy. Optimal Delta Evaluation method is...
متن کاملEfficient K-Nearest Neighbor Join Algorithms for High Dimensional Sparse Data
The K-Nearest Neighbor (KNN) join is an expensive but important operation in many data mining algorithms. Several recent applications need to perform KNN join for high dimensional sparse data. Unfortunately, all existing KNN join algorithms are designed for low dimensional data. To fulfill this void, we investigate the KNN join problem for high dimensional sparse data. In this paper, we propose...
متن کاملMLSD: A Network Topology Discovery Protocol for Infrastructure Wireless Mesh Networks
In Infrastructure Wireless Mesh Networks (IWMN), the network topology discovery protocol has an essential role for responding proactively and promptly to topology modifications. It is responsible for disseminating link state updates, managing the tension between update frequency and number of messages, which has strong impact in protocol performance, network resource consumption and scalability...
متن کاملProcessing Sliding Window Multi-Joins in Continuous Queries over Data Streams
We study sliding window multi-join processing in continuous queries over data streams. Several algorithms are reported for performing continuous, incremental joins, under the assumption that all the sliding windows fit in main memory. The algorithms include multiway incremental nested loop joins (NLJs) and multi-way incremental hash joins. We also propose join ordering heuristics to minimize th...
متن کاملLSH At Large - Distributed KNN Search in High Dimensions
We consider K-Nearest Neighbor search for high dimensional data in large-scale structured Peer-to-Peer networks. We present an efficient mapping scheme based on p-stable Locality Sensitive Hashing to assign hash buckets to peers in a Chord-style overlay network. To minimize network traffic, we process queries in an incremental top-K fashion leveraging on a locality preserving mapping to the pee...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- GeoInformatica
دوره 14 شماره
صفحات -
تاریخ انتشار 2010